Automatic Categorization Tool for Open Software Repositories

Authors

  • Shinji Kawaguchi
  • Pankaj K. Garg
  • Makoto Matsushita
  • Katsuro Inoue
Abstract

The world of Open Source software has demonstrated the remarkable appeal of communal software development. A large number of software projects can leverage, reuse, and coordinate their work through Internet and web-based technology. For example, SourceForge currently hosts about sixty thousand software systems. Similar strategies have been suggested for corporate software development, through notions like Corporate Source and Progressive Open Source [6, 7]. When used in a corporate setting, infrastructures for project information sharing present new opportunities. For example, one would like to know all projects that have something in common, so that the project groups can collaborate and share their work. With thousands of projects, manually locating related projects can be difficult. Hence, we propose to use automatic software categorization to find clusters of related software projects, using only the source code from projects. Our experiments with a small set of C programs demonstrate the potential for automatic categorization of software systems without human aid.
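
The abstract does not say which categorization algorithm the authors use, so the following is only a minimal sketch of the general idea of clustering projects from source code alone. It assumes, purely for illustration, that projects are compared by the cosine similarity of their identifier counts and grouped by a single-link threshold; the function names and the 0.5 threshold are hypothetical and do not come from the paper.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def identifier_counts(source: str) -> Counter:
    """Count identifier-like tokens (names, keywords) in source text."""
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(projects: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Single-link grouping (an assumed strategy, not the paper's):
    a project joins every cluster containing a member whose
    similarity to it is at least `threshold`, merging those clusters."""
    vectors = {name: identifier_counts(src) for name, src in projects.items()}
    clusters: list[set[str]] = []
    for name in projects:
        linked = [
            c for c in clusters
            if any(cosine(vectors[name], vectors[m]) >= threshold for m in c)
        ]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters
```

Calling `cluster` with a mapping from project name to concatenated source text returns a list of sets of project names. Raw identifier counts are a crude signal; a real tool would probably need to discount language keywords and very common names before comparing projects.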

Similar resources

Automatic Categorization of Software Modules

The world of software has demonstrated the remarkable appeal of communal software development. A large number of software projects can leverage, reuse, and coordinate their work through Internet and web-based technology. For example, SourceForge currently hosts about sixty thousand software systems, and similar strategies have been suggested for corporate software development. With thousands of projects,...

A text categorisation tool for open source communities based on semantic analysis

Open source software (OSS) projects are supported by communities interacting through software repositories and mailing lists. Thousands of contributors participate in the development of the projects although they rarely meet each other. The result is a huge archived repository with thousands of questions, answers and contributions usually difficult to explore. We propose a tool based on semanti...

Mining Software Repositories for Defect Categorization

Early detection of software defects is very important for decreasing software cost and, in turn, increasing software quality. The success of software industries depends not only on gaining knowledge about software defects but also, to a large extent, on the manner in which information about defects is collected and used. In software industries, individuals at different levels from customers to engi...

Approaches for Categorization of Reusable Software Components

A reuse repository manager organizes reusable software components into different categories and needs to find the category of each reusable software component. In this paper, we have used different pure and hybrid approaches to find the domain relevancy of the component to a particular domain. Probabilistic Latent Semantic Analysis (PLSA) approach, LSA, Singular Value Decomposition (SVD) technique,...

Hierarchical Categorization of Open Source Software by Online Profiles

The large amounts of freely available open source software over the Internet are fundamentally changing the traditional paradigms of software development. Efficient categorization of the massive projects for retrieving relevant software is of vital importance for Internet-based software development such as solution searching, best practices learning and so on. Many previous works have been cond...


Publication date: 2003